Parallel meta-blocking for scaling entity resolution over big heterogeneous data

Similar resources

Parallel meta-blocking for scaling entity resolution over big heterogeneous data

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further increase efficiency, Meta-blocking is being used...

Scaling Entity Resolution to Large, Heterogeneous Data with Enhanced Meta-blocking

Entity Resolution constitutes a quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking in order to significantly increase precision at a limited cost in recall. Yet, its processing can be time-consuming, while its precision remains poor for configurations with high recall. In this work, we propose new meta-blo...

Entity Resolution in a Big Data Framework

Resource Description Framework (RDF) is a data model that can be used to publish semistructured data visualized as directed graphs. An example is Dataset 1 in Fig. 1. Nodes in the graph represent entities and edges represent properties connecting these entities. Two nodes may refer to the same logical entity, despite being syntactically disparate. For example, the entity Mickey Beats in Datase...

Top-K Entity Units Retrieval Over Big Data

Over the past several years, data volumes have grown explosively. This growth has affected fields ranging from biomedical engineering and business consulting to social media and mobile applications. Big Data is a double-edged sword. While it provides valuable commercial insights and enables innovative discovery in science, Big Data also has many cha...

Scaling Security for Big, Parallel File Systems

The need for peta- and exabyte-scale parallel file systems that support high-performance computing (HPC) has been rapidly increasing. These systems have unique demands, different from those of traditional distributed file systems. As a result, securing I/O in big, parallel file systems without significantly impacting performance has proven challenging. Parallel file systems are commonly composed ...

Journal

Journal title: Information Systems

Year: 2017

ISSN: 0306-4379

DOI: 10.1016/j.is.2016.12.001